15 research outputs found

    The CMS DBS Query Language

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use, without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We describe the design of the query system, provide details of the language components, and give an overview of how this component fits into the overall data discovery system architecture.
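The core idea — discovering join conditions from a graph of the schema so the user never writes joins — can be sketched as a breadth-first search over table relationships. The table and column names below are illustrative, not the actual CMS DBS schema.

```python
from collections import deque

# Hypothetical fragment of a DBS-like schema graph: nodes are tables,
# edges carry the known foreign-key join condition between them.
JOINS = {
    ("dataset", "block"): "dataset.id = block.dataset_id",
    ("block", "file"): "block.id = file.block_id",
}

# Undirected adjacency list, so a path can be found in either direction.
ADJ = {}
for a, b in JOINS:
    ADJ.setdefault(a, []).append(b)
    ADJ.setdefault(b, []).append(a)

def join_path(start, goal):
    """Breadth-first search for the shortest chain of tables linking start to goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ADJ.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def build_sql(select_col, start, goal):
    """Expand the discovered path into a SELECT with inferred join conditions."""
    path = join_path(start, goal)
    conds = [JOINS.get((a, b)) or JOINS[(b, a)] for a, b in zip(path, path[1:])]
    return "SELECT %s FROM %s WHERE %s" % (
        select_col, ", ".join(path), " AND ".join(conds))

# A query relating dataset to file is routed through block automatically.
print(build_sql("file.name", "dataset", "file"))
```

Here a user asks only for files belonging to a dataset; the intermediate block table and both join conditions are supplied by the path search, which is the property the abstract describes.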

    Job Life Cycle Management Libraries for CMS Workflow Management Projects

    Scientific analysis and simulation require the processing and generation of millions of data samples. These processing and generation tasks are often composed of multiple smaller tasks divided over multiple (computing) sites. This paper discusses the Compact Muon Solenoid (CMS) workflow infrastructure, and specifically the Python-based workflow library used for so-called task lifecycle management. The CMS workflow infrastructure consists of three layers: high-level specification of the various tasks based on input/output datasets, life cycle management of task instances derived from the high-level specification, and execution management. The workflow library is the result of a convergence of three CMS subprojects that respectively deal with scientific analysis, simulation, and real-time data aggregation from the experiment.
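Task lifecycle management of the kind described here is essentially a state machine over task instances. The following is a minimal sketch under that assumption; the state names and API are illustrative, not the actual CMS workflow library.

```python
# Legal transitions for a task instance; anything else is rejected.
# A failed task may be resubmitted, a complete task is terminal.
TRANSITIONS = {
    "created": {"submitted"},
    "submitted": {"running", "failed"},
    "running": {"complete", "failed"},
    "failed": {"submitted"},
    "complete": set(),
}

class Task:
    """One task instance derived from a high-level specification."""

    def __init__(self, name):
        self.name = name
        self.state = "created"

    def advance(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        return self.state

t = Task("merge_job_42")
for s in ("submitted", "running", "complete"):
    t.advance(s)
print(t.state)  # -> complete
```

Keeping the transition table as data rather than scattered conditionals makes it easy for an execution layer to drive many task instances through the same life cycle.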

    The CMS Integration Grid Testbed

    The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Grid-wide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent-based MonALISA. Domain-specific software is packaged and installed using the Distribution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous two-month span in the fall of 2002, over 1 million official CMS GEANT-based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids.
    Comment: CHEP 2003, MOCT01

    Software packaging with DAR

    One of the important tasks in distributed computing is delivering software applications to the computing resources. The Distribution After Release (DAR) tool is being used to package software applications for the worldwide event production by the CMS Collaboration. This presentation focuses on the concept of packaging applications based on the runtime environment. We discuss solutions for more effective software distribution based on two years' experience with DAR. Finally, we give an overview of the application distribution process and the interfaces to the CMS production tools.

    Contextual Constraint Modeling in Grid Application Workflows

    This paper introduces a new mechanism for specifying constraints in distributed workflows. By introducing constraints in a contextual form, it is shown how different people and groups within collaborative communities can cooperatively constrain workflows. A comparison with existing state-of-the-art workflow systems is made. These ideas are explored in practice with an illustrative example from High Energy Physics.
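The idea of contextual constraints — different groups cooperatively restricting workflows, each within its own scope — can be sketched as predicates paired with an applicability context. All names and policies below are illustrative assumptions, not the paper's actual mechanism.

```python
# Each constraint is (context, predicate): the predicate is enforced only
# on workflow steps where the context applies. Groups add constraints
# independently, and a step must satisfy every applicable one.
constraints = []

def add_constraint(context, predicate):
    constraints.append((context, predicate))

def allowed(step):
    """A step is allowed only if all constraints whose context matches it hold."""
    return all(pred(step) for ctx, pred in constraints if ctx(step))

# Site operators constrain where any job may run...
add_constraint(lambda s: True,
               lambda s: s["site"] in {"FNAL", "CERN"})
# ...while a physics group constrains only its own analysis steps.
add_constraint(lambda s: s["group"] == "higgs",
               lambda s: s["memory_mb"] <= 2000)

print(allowed({"site": "FNAL", "group": "higgs", "memory_mb": 1500}))  # True
print(allowed({"site": "FNAL", "group": "higgs", "memory_mb": 4000}))  # False
```

Because each constraint carries its own context, the two groups never have to edit a shared global policy, which is the cooperative property the abstract highlights.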

    The CMS Dataset Bookkeeping Service

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels, and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS, with authentication provided by Grid certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
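The "loose federation" formed by moving catalog entries between service levels can be illustrated with a toy migration between a private and a global catalog. The data structures, dataset name, and migration policy here are assumptions for illustration, not the actual DBS migration tools.

```python
# Two service levels as simple catalogs: a local one for private analysis
# and a global one for CMS-wide access.
local_catalog = {
    "/Higgs/Sim-v1/RECO": {"files": 120, "provenance": "MC"},
}
global_catalog = {}

def migrate(name, src, dst):
    """Copy one dataset entry from src to dst, refusing to clobber an existing entry."""
    if name in dst:
        raise ValueError(f"{name} already present in destination catalog")
    dst[name] = src[name]
    return dst[name]

migrate("/Higgs/Sim-v1/RECO", local_catalog, global_catalog)
print("/Higgs/Sim-v1/RECO" in global_catalog)  # True
```

The entry remains in the local catalog after migration, so each service level keeps its own database while sharing entries on demand — a loose federation rather than a single central store.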